Overview

Dataset statistics

Number of variables: 9
Number of observations: 500
Missing cells: 40
Missing cells (%): 0.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 35.3 KiB
Average record size in memory: 72.3 B
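The derived figures in this table can be checked from the raw counts. The sketch below is only an arithmetic cross-check using numbers copied from this report; the variable names are illustrative, and it assumes that 1 KiB is 1024 bytes and that the percentage is taken over all cells (variables times observations).

```python
# Figures copied from the "Dataset statistics" table and the "Missing" alerts.
n_variables = 9
n_observations = 500
missing_by_column = {"GRE Score": 15, "TOEFL Score": 10, "University Rating": 15}
total_size_kib = 35.3

total_cells = n_variables * n_observations            # every cell in the table
missing_cells = sum(missing_by_column.values())       # 15 + 10 + 15
missing_pct = 100 * missing_cells / total_cells       # share of empty cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(missing_cells)                  # 40
print(f"{missing_pct:.1f}%")          # 0.9%
print(f"{avg_record_bytes:.1f} B")    # 72.3 B
```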

Variable types

Numeric: 7
Categorical: 2

Alerts

GRE Score is highly overall correlated with TOEFL Score and 4 other fields (High correlation)
TOEFL Score is highly overall correlated with GRE Score and 6 other fields (High correlation)
SOP is highly overall correlated with TOEFL Score and 4 other fields (High correlation)
LOR is highly overall correlated with TOEFL Score and 3 other fields (High correlation)
CGPA is highly overall correlated with GRE Score and 5 other fields (High correlation)
Chance of Admit is highly overall correlated with GRE Score and 6 other fields (High correlation)
Research is highly overall correlated with GRE Score and 3 other fields (High correlation)
University Rating is highly overall correlated with GRE Score and 5 other fields (High correlation)
GRE Score has 15 (3.0%) missing values (Missing)
TOEFL Score has 10 (2.0%) missing values (Missing)
University Rating has 15 (3.0%) missing values (Missing)
Serial No. is uniformly distributed (Uniform)
Serial No. has unique values (Unique)

Reproduction

Analysis started: 2022-12-11 22:33:43.089306
Analysis finished: 2022-12-11 22:33:53.566790
Duration: 10.48 seconds
Software version: pandas-profiling v3.5.0
Download configuration: config.json
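As a small consistency check, the reported duration follows from the two timestamps. This is a minimal sketch using only the standard library; the format string is an assumption about how the timestamps are written.

```python
from datetime import datetime

# Assumed timestamp layout: date, time, microseconds.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2022-12-11 22:33:43.089306", FMT)
finished = datetime.strptime("2022-12-11 22:33:53.566790", FMT)

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")  # 10.48 seconds
```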

Variables

Serial No.
Real number (ℝ)

UNIFORM
UNIQUE

Distinct: 500
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 250.5
Minimum: 1
Maximum: 500
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 25.95
Q1: 125.75
median: 250.5
Q3: 375.25
95-th percentile: 475.05
Maximum: 500
Range: 499
Interquartile range (IQR): 249.5

Descriptive statistics

Standard deviation: 144.48183
Coefficient of variation (CV): 0.57677378
Kurtosis: -1.2
Mean: 250.5
Median Absolute Deviation (MAD): 125
Skewness: 0
Sum: 125250
Variance: 20875
Monotonicity: Strictly increasing

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
1                   1      0.2%
330                 1      0.2%
343                 1      0.2%
342                 1      0.2%
341                 1      0.2%
340                 1      0.2%
339                 1      0.2%
338                 1      0.2%
337                 1      0.2%
336                 1      0.2%
Other values (490)  490    98.0%

Value  Count  Frequency (%)
1      1      0.2%
2      1      0.2%
3      1      0.2%
4      1      0.2%
5      1      0.2%
6      1      0.2%
7      1      0.2%
8      1      0.2%
9      1      0.2%
10     1      0.2%

Value  Count  Frequency (%)
500    1      0.2%
499    1      0.2%
498    1      0.2%
497    1      0.2%
496    1      0.2%
495    1      0.2%
494    1      0.2%
493    1      0.2%
492    1      0.2%
491    1      0.2%
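
Because Serial No. is simply the integers 1 to 500, its quantile and descriptive statistics can be reproduced by hand. The sketch below is an illustrative cross-check, not the tool's own code: it assumes percentiles are computed by linear interpolation between order statistics and that the variance uses the sample (n - 1) denominator, both of which reproduce the figures reported for this variable. The helper name `percentile` is hypothetical.

```python
import statistics

def percentile(sorted_values, q):
    """q-th percentile (0-100) by linear interpolation between order statistics."""
    pos = (len(sorted_values) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)

serial = list(range(1, 501))          # Serial No. takes each value 1..500 once

mean = statistics.mean(serial)
variance = statistics.variance(serial)   # sample variance (n - 1 denominator)
std = statistics.stdev(serial)
cv = std / mean                          # coefficient of variation

p5 = percentile(serial, 5)
q1 = percentile(serial, 25)
q3 = percentile(serial, 75)
iqr = q3 - q1

median = statistics.median(serial)
# Median absolute deviation: median distance from the median.
mad = statistics.median(abs(x - median) for x in serial)

print(mean, variance, round(std, 5), round(cv, 8))
print(round(p5, 2), q1, q3, iqr, mad)
```

The same helper applies to any of the numeric variables, provided missing values are dropped first.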

GRE Score
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 49
Distinct (%): 10.1%
Missing: 15
Missing (%): 3.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 316.55876
Minimum: 290
Maximum: 340
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 290
5-th percentile: 298
Q1: 308
median: 317
Q3: 325
95-th percentile: 335
Maximum: 340
Range: 50
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 11.274704
Coefficient of variation (CV): 0.035616466
Kurtosis: -0.68446669
Mean: 316.55876
Median Absolute Deviation (MAD): 8
Skewness: -0.051686583
Sum: 153531
Variance: 127.11896
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=49)
Value               Count  Frequency (%)
324                 22     4.4%
312                 22     4.4%
322                 17     3.4%
327                 17     3.4%
316                 17     3.4%
321                 17     3.4%
320                 16     3.2%
314                 16     3.2%
311                 16     3.2%
325                 15     3.0%
Other values (39)   310    62.0%

Value  Count  Frequency (%)
290    2      0.4%
293    1      0.2%
294    2      0.4%
295    5      1.0%
296    5      1.0%
297    6      1.2%
298    10     2.0%
299    8      1.6%
300    12     2.4%
301    10     2.0%

Value  Count  Frequency (%)
340    9      1.8%
339    3      0.6%
338    4      0.8%
337    2      0.4%
336    5      1.0%
335    4      0.8%
334    7      1.4%
333    4      0.8%
332    7      1.4%
331    9      1.8%

TOEFL Score
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 29
Distinct (%): 5.9%
Missing: 10
Missing (%): 2.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 107.18776
Minimum: 92
Maximum: 120
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 92
5-th percentile: 98
Q1: 103
median: 107
Q3: 112
95-th percentile: 118
Maximum: 120
Range: 28
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 6.1128994
Coefficient of variation (CV): 0.057029829
Kurtosis: -0.66456537
Mean: 107.18776
Median Absolute Deviation (MAD): 5
Skewness: 0.10206773
Sum: 52522
Variance: 37.367539
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=29)
Value               Count  Frequency (%)
110                 42     8.4%
105                 37     7.4%
104                 29     5.8%
107                 28     5.6%
112                 27     5.4%
106                 26     5.2%
103                 25     5.0%
102                 24     4.8%
100                 24     4.8%
99                  22     4.4%
Other values (19)   206    41.2%

Value  Count  Frequency (%)
92     1      0.2%
93     2      0.4%
94     2      0.4%
95     3      0.6%
96     6      1.2%
97     7      1.4%
98     10     2.0%
99     22     4.4%
100    24     4.8%
101    19     3.8%

Value  Count  Frequency (%)
120    9      1.8%
119    10     2.0%
118    10     2.0%
117    8      1.6%
116    16     3.2%
115    11     2.2%
114    18     3.6%
113    18     3.6%
112    27     5.4%
111    20     4.0%

University Rating
Categorical

HIGH CORRELATION
MISSING

Distinct: 5
Distinct (%): 1.0%
Missing: 15
Missing (%): 3.0%
Memory size: 4.0 KiB

Value  Count
3.0    154
2.0    124
4.0    103
5.0    72
1.0    32

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1455
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
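
The character-level figures for this variable follow directly from its category labels, which are three-character strings such as "3.0". The sketch below is an illustration rather than the tool's implementation: it uses Python's standard `unicodedata` module to look up each character's general category, where `Nd` is "Decimal Number" and `Po` is "Other Punctuation". The value counts are copied from this section and exclude the 15 missing rows.

```python
import unicodedata
from collections import Counter

# Non-missing value counts for University Rating, copied from this section.
value_counts = {"3.0": 154, "2.0": 124, "4.0": 103, "5.0": 72, "1.0": 32}

char_counts = Counter()
category_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count
        # General category code, e.g. "Nd" (Decimal Number) or "Po" (Other Punctuation).
        category_counts[unicodedata.category(ch)] += count

total_characters = sum(char_counts.values())

print(total_characters)       # 1455
print(len(char_counts))       # 7
print(dict(category_counts))  # {'Nd': 970, 'Po': 485}
```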

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 4.0
2nd row: 4.0
3rd row: 3.0
4th row: 3.0
5th row: 2.0

Common Values

Value      Count  Frequency (%)
3.0        154    30.8%
2.0        124    24.8%
4.0        103    20.6%
5.0        72     14.4%
1.0        32     6.4%
(Missing)  15     3.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
3.0    154    31.8%
2.0    124    25.6%
4.0    103    21.2%
5.0    72     14.8%
1.0    32     6.6%

Most occurring characters

Value  Count  Frequency (%)
.      485    33.3%
0      485    33.3%
3      154    10.6%
2      124    8.5%
4      103    7.1%
5      72     4.9%
1      32     2.2%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     970    66.7%
Other Punctuation  485    33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      485    50.0%
3      154    15.9%
2      124    12.8%
4      103    10.6%
5      72     7.4%
1      32     3.3%
Other Punctuation
Value  Count  Frequency (%)
.      485    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  1455   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
.      485    33.3%
0      485    33.3%
3      154    10.6%
2      124    8.5%
4      103    7.1%
5      72     4.9%
1      32     2.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1455   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
.      485    33.3%
0      485    33.3%
3      154    10.6%
2      124    8.5%
4      103    7.1%
5      72     4.9%
1      32     2.2%

SOP
Real number (ℝ)

Distinct: 9
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.374
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1.5
Q1: 2.5
median: 3.5
Q3: 4
95-th percentile: 5
Maximum: 5
Range: 4
Interquartile range (IQR): 1.5

Descriptive statistics

Standard deviation: 0.99100362
Coefficient of variation (CV): 0.29371773
Kurtosis: -0.70571695
Mean: 3.374
Median Absolute Deviation (MAD): 0.5
Skewness: -0.2289724
Sum: 1687
Variance: 0.98208818
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=9)
Value  Count  Frequency (%)
4      89     17.8%
3.5    88     17.6%
3      80     16.0%
2.5    64     12.8%
4.5    63     12.6%
2      43     8.6%
5      42     8.4%
1.5    25     5.0%
1      6      1.2%

Value  Count  Frequency (%)
1      6      1.2%
1.5    25     5.0%
2      43     8.6%
2.5    64     12.8%
3      80     16.0%
3.5    88     17.6%
4      89     17.8%
4.5    63     12.6%
5      42     8.4%

Value  Count  Frequency (%)
5      42     8.4%
4.5    63     12.6%
4      89     17.8%
3.5    88     17.6%
3      80     16.0%
2.5    64     12.8%
2      43     8.6%
1.5    25     5.0%
1      6      1.2%

LOR
Real number (ℝ)

Distinct: 9
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.484
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 3
median: 3.5
Q3: 4
95-th percentile: 5
Maximum: 5
Range: 4
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.92544957
Coefficient of variation (CV): 0.26562847
Kurtosis: -0.74574851
Mean: 3.484
Median Absolute Deviation (MAD): 0.5
Skewness: -0.14529031
Sum: 1742
Variance: 0.85645691
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=9)
Value  Count  Frequency (%)
3      99     19.8%
4      94     18.8%
3.5    86     17.2%
4.5    63     12.6%
2.5    50     10.0%
5      50     10.0%
2      46     9.2%
1.5    11     2.2%
1      1      0.2%

Value  Count  Frequency (%)
1      1      0.2%
1.5    11     2.2%
2      46     9.2%
2.5    50     10.0%
3      99     19.8%
3.5    86     17.2%
4      94     18.8%
4.5    63     12.6%
5      50     10.0%

Value  Count  Frequency (%)
5      50     10.0%
4.5    63     12.6%
4      94     18.8%
3.5    86     17.2%
3      99     19.8%
2.5    50     10.0%
2      46     9.2%
1.5    11     2.2%
1      1      0.2%

CGPA
Real number (ℝ)

Distinct: 184
Distinct (%): 36.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.57644
Minimum: 6.8
Maximum: 9.92
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 6.8
5-th percentile: 7.638
Q1: 8.1275
median: 8.56
Q3: 9.04
95-th percentile: 9.6
Maximum: 9.92
Range: 3.12
Interquartile range (IQR): 0.9125

Descriptive statistics

Standard deviation: 0.6048128
Coefficient of variation (CV): 0.070520263
Kurtosis: -0.5612784
Mean: 8.57644
Median Absolute Deviation (MAD): 0.46
Skewness: -0.026612517
Sum: 4288.22
Variance: 0.36579852
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
8.76                9      1.8%
8                   9      1.8%
8.56                7      1.4%
8.12                7      1.4%
8.45                7      1.4%
8.54                7      1.4%
8.66                6      1.2%
8.65                6      1.2%
8.64                6      1.2%
8.5                 6      1.2%
Other values (174)  430    86.0%

Value  Count  Frequency (%)
6.8    1      0.2%
7.2    1      0.2%
7.21   1      0.2%
7.23   1      0.2%
7.25   1      0.2%
7.28   1      0.2%
7.3    1      0.2%
7.34   2      0.4%
7.36   1      0.2%
7.4    1      0.2%

Value  Count  Frequency (%)
9.92   1      0.2%
9.91   1      0.2%
9.87   2      0.4%
9.86   1      0.2%
9.82   1      0.2%
9.8    3      0.6%
9.78   1      0.2%
9.76   2      0.4%
9.74   1      0.2%
9.7    2      0.4%

Research
Categorical

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Value  Count
1      280
0      220

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 500
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value  Count  Frequency (%)
1      280    56.0%
0      220    44.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
1      280    56.0%
0      220    44.0%

Most occurring characters

Value  Count  Frequency (%)
1      280    56.0%
0      220    44.0%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  500    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      280    56.0%
0      220    44.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  500    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
1      280    56.0%
0      220    44.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  500    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1      280    56.0%
0      220    44.0%

Chance of Admit
Real number (ℝ)

Distinct: 61
Distinct (%): 12.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.72174
Minimum: 0.34
Maximum: 0.97
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 0.34
5-th percentile: 0.47
Q1: 0.63
median: 0.72
Q3: 0.82
95-th percentile: 0.94
Maximum: 0.97
Range: 0.63
Interquartile range (IQR): 0.19

Descriptive statistics

Standard deviation: 0.1411404
Coefficient of variation (CV): 0.19555575
Kurtosis: -0.4546818
Mean: 0.72174
Median Absolute Deviation (MAD): 0.1
Skewness: -0.28996621
Sum: 360.87
Variance: 0.019920614
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0.71                23     4.6%
0.64                19     3.8%
0.73                18     3.6%
0.79                16     3.2%
0.72                16     3.2%
0.78                15     3.0%
0.76                14     2.8%
0.7                 13     2.6%
0.68                13     2.6%
0.8                 13     2.6%
Other values (51)   340    68.0%

Value  Count  Frequency (%)
0.34   2      0.4%
0.36   2      0.4%
0.37   1      0.2%
0.38   2      0.4%
0.39   1      0.2%
0.42   4      0.8%
0.43   1      0.2%
0.44   3      0.6%
0.45   3      0.6%
0.46   5      1.0%

Value  Count  Frequency (%)
0.97   4      0.8%
0.96   8      1.6%
0.95   5      1.0%
0.94   13     2.6%
0.93   12     2.4%
0.92   9      1.8%
0.91   10     2.0%
0.9    9      1.8%
0.89   11     2.2%
0.88   4      0.8%

Interactions


Correlations


Auto

The auto setting chooses an interpretable pairwise metric for each pair of columns according to their types. Each entry below reads "variable types : method, range":
  • Categorical-Categorical : Cramér's V, [0,1]
  • Numerical-Categorical : Cramér's V, [0,1] (using a discretized numerical column)
  • Numerical-Numerical : Spearman's ρ, [-1,1]
The number of bins used to discretize the numerical column in a Numerical-Categorical pair can be changed using config.correlations["auto"].n_bins. The number of bins determines the granularity of the association being measured.

This configuration uses the recommended metric for each pair of columns.
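
The type-to-method mapping above can be written as a small lookup. This is only an illustrative sketch: the names `AUTO_METHODS` and `auto_method` are hypothetical and are not part of pandas-profiling; only the mapping itself is taken from the text.

```python
# Hypothetical lookup table; only the mapping itself comes from the report text.
# A frozenset key makes the lookup independent of the order of the two columns.
AUTO_METHODS = {
    frozenset(["categorical"]): ("Cramér's V", (0, 1)),
    frozenset(["numerical", "categorical"]): ("Cramér's V", (0, 1)),
    frozenset(["numerical"]): ("Spearman's ρ", (-1, 1)),
}

def auto_method(type_a, type_b):
    """Return (method name, value range) for a pair of column types."""
    return AUTO_METHODS[frozenset([type_a.lower(), type_b.lower()])]

# GRE Score (numerical) vs. Research (categorical) would use Cramér's V
# on a discretized copy of the numerical column.
print(auto_method("Numerical", "Categorical"))
print(auto_method("Numerical", "Numerical"))
```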

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
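
The recipe above (rank both variables, then correlate the ranks) can be sketched in plain Python. This is a minimal illustration, not the tool's implementation; it assumes tied values receive the average of their rank positions, and the function names are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]       # monotonic but nonlinear in x
rho = spearman(x, y)          # ranks agree exactly
r = pearson(x, y)             # lower, because the relation is not linear
```

On this toy input the ranks of x and y coincide, so ρ reaches its maximum even though the relation is not linear, which is the property the paragraph above describes.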

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
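
The definition and the invariance property can be illustrated with a short sketch. The data values and the name `pearson_r` are made up for illustration; the sample (n - 1) denominators are an assumption that cancels out of the ratio in any case.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)

# Illustrative data, not taken from the dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)

# Shifting and rescaling either variable (positive scale) leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

# An exactly linear relation with positive slope gives r = 1 whatever the slope.
r_shallow = pearson_r(x, [0.1 * a for a in x])
r_steep = pearson_r(x, [50 * a for a in x])
```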

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and 1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
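
The pair-counting definition above translates directly into code. The sketch below implements exactly that definition (often called tau-a), in which tied pairs count as neither concordant nor discordant; whether the report itself uses a tie-adjusted variant is not stated here, so treat this as an illustration only. The function name is hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables order the pair the same way.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]              # one swapped pair out of six
tau = kendall_tau_a(x, y)     # (5 - 1) / 6
print(round(tau, 4))          # 0.6667
```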

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
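
A bias-corrected Cramér's V can be sketched as follows. This is an illustration under stated assumptions rather than the tool's own code: it applies the commonly cited form of Bergsma's correction, in which both the chi-squared-based term and the table dimensions are adjusted for sample size. The function name and the toy data are hypothetical.

```python
from collections import Counter
from math import sqrt

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equal-length sequences of labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full r x k contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    phi2 = chi2 / n
    # Assumed correction terms (Bergsma-style): shrink phi^2 and the dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0

labels = ["a"] * 10 + ["b"] * 10
v_perfect = cramers_v_corrected(labels, labels)   # identical labels
v_independent = cramers_v_corrected(["a", "a", "b", "b"], ["c", "d", "c", "d"])
```

With identical label sequences the coefficient is 1, and with a perfectly balanced independent table it is 0, matching the two endpoints described above.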

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows

     Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research  Chance of Admit
0    1           337.0      118.0        4.0                4.5  4.5  9.65  1         0.92
1    2           324.0      107.0        4.0                4.0  4.5  8.87  1         0.76
2    3           NaN        104.0        3.0                3.0  3.5  8.00  1         0.72
3    4           322.0      110.0        3.0                3.5  2.5  8.67  1         0.80
4    5           314.0      103.0        2.0                2.0  3.0  8.21  0         0.65
5    6           330.0      115.0        5.0                4.5  3.0  9.34  1         0.90
6    7           321.0      109.0        NaN                3.0  4.0  8.20  1         0.75
7    8           308.0      101.0        2.0                3.0  4.0  7.90  0         0.68
8    9           302.0      102.0        1.0                2.0  1.5  8.00  0         0.50
9    10          323.0      108.0        3.0                3.5  3.0  8.60  0         0.45

Last rows

     Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research  Chance of Admit
490  491         307.0      105.0        2.0                2.5  4.5  8.12  1         0.67
491  492         297.0      99.0         4.0                3.0  3.5  7.81  0         0.54
492  493         298.0      101.0        4.0                2.5  4.5  7.69  1         0.53
493  494         300.0      95.0         2.0                3.0  1.5  8.22  1         0.62
494  495         301.0      99.0         3.0                2.5  2.0  8.45  1         0.68
495  496         332.0      108.0        5.0                4.5  4.0  9.02  1         0.87
496  497         337.0      117.0        5.0                5.0  5.0  9.87  1         0.96
497  498         330.0      120.0        5.0                4.5  5.0  9.56  1         0.93
498  499         312.0      103.0        4.0                4.0  5.0  8.43  0         0.73
499  500         327.0      113.0        4.0                4.5  4.5  9.04  0         0.84